arXiv:1501.07686v1 [cs.FL] 30 Jan 2015


Construction of rational expression from tree automata using a 
generalization of Arden’s Lemma 


Younes Guellouma$^{1,3}$, Ludovic Mignot$^{2}$, Hadda Cherroun$^{1,3}$, and Djelloul Ziadi$^{2,3}$


$^1$ Laboratoire LIM, Université Amar Telidji, Laghouat, Algérie
{y.guellouma,hadda_cherroun}@mail.lagh-univ.dz
$^2$ LITIS, Université de Rouen, 76801 Saint-Étienne du Rouvray Cedex, France
{ludovic.mignot,djelloul.ziadi}@univ-rouen.fr
$^3$ Supported by the MESRS - Algeria under Project 8/U03/7015.


Abstract. Arden’s Lemma is a classical result in language theory allowing the computation of a rational expression denoting the language recognized by a finite string automaton. In this paper we generalize this important lemma to rational tree languages. Moreover, we also propose a construction of a rational tree expression which denotes the tree language accepted by a finite tree automaton.
1 Introduction

Trees are natural structures used in many fields of computer science, such as XML [15], indexing, natural language processing, code generation for compilers, term rewriting [6] and cryptography [7]. This widespread use motivates the study of the theoretical foundations of this notion.

In many cases, the blow-up of trees makes this large amount of data difficult to store and to represent. Several solutions to this problem exist; among them is the use of tree automata and rational tree expressions, which are compact finite structures that recognize and represent infinite sets of trees.

As part of formal language theory, trees are considered as a generalization of strings. Indeed, in the late 1960s [?], many researchers generalized strings to trees, and many notions appeared, such as tree languages, tree automata, rational tree expressions and tree grammars.

Since tree automata are useful from an acceptance point of view and rational expressions from a descriptive one, the equivalence between the two representations is an essential question. Kleene's result [?] establishes this equivalence between the languages accepted by tree automata and the languages denoted by rational expressions.

Kleene's theorem states that the set $\mathrm{Rat}(\Sigma)$ of the languages denoted by rational expressions over a ranked alphabet $\Sigma$ and the set $\mathrm{Rec}(\Sigma)$ of the languages recognized by tree automata over $\Sigma$ are equal. This can be established by proving the two inclusions $\mathrm{Rat}(\Sigma) \subseteq \mathrm{Rec}(\Sigma)$ and $\mathrm{Rec}(\Sigma) \subseteq \mathrm{Rat}(\Sigma')$, where $\Sigma \subseteq \Sigma'$. In other words, a tree language is recognized by some automaton if and only if it is denoted by some rational expression. Thus two constructions are needed.

For the construction of a tree automaton from a rational expression, several techniques exist. First, Kuske and Meinecke [5] generalize the notion of partial derivatives of languages [1] from strings to trees and propose a tree equation automaton, constructed from the derivation of a linearized version of rational expressions. They use the ZPC structure [?] to reach a better complexity. After that, Mignot et al. [?] propose an efficient algorithm to compute this generalized tree equation automaton. Next, Laugerotte et al. [9] generalize position automata to trees. Finally, the morphic links between these constructions have been defined in [?].

In this paper, we propose a construction for the other direction of Kleene's Theorem: the passage from a tree automaton to a rational tree expression. To this end, we generalize Arden's Lemma from strings to trees. The complexity of such a construction is exponential.

Section 2 recalls some preliminaries and basic properties. We generalize the notion of equation system in Section 3. Next, the generalization of Arden's Lemma to trees and its proof are given in Section 4, leading to the computation of solutions for particular recursive systems. Finally, we show how to compute a rational expression denoting the language recognized by a tree automaton in Section 5.




2 Preliminaries and Basic Properties 

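Let $\Sigma = \bigcup_{n \geq 0} \Sigma_n$ be a graded alphabet. A tree $t$ over $\Sigma$ is inductively defined by $t = f(t_1, \ldots, t_n)$ with $f \in \Sigma_n$ and $t_1, \ldots, t_n$ any $n$ trees over $\Sigma$. The set of trees over $\Sigma$ is denoted by $T(\Sigma)$, and a tree language is a subset of $T(\Sigma)$. The set $\mathrm{St}(t)$ of the subtrees of a tree $t = f(t_1, \ldots, t_n)$ is defined by $\mathrm{St}(t) = \{t\} \cup \bigcup_{k=1}^{n} \mathrm{St}(t_k)$. This set is extended to tree languages: the set of subtrees of a tree language $L \subseteq T(\Sigma)$ is $\mathrm{St}(L) = \bigcup_{t \in L} \mathrm{St}(t)$. The height of a tree $t$ in $T(\Sigma)$ is inductively defined by $\mathrm{Height}(f(t_1, \ldots, t_n)) = 1 + \max\{\mathrm{Height}(t_i) \mid 1 \leq i \leq n\}$ (with the convention $\max \emptyset = 0$), where $f$ is a symbol in $\Sigma_n$ and $t_1, \ldots, t_n$ are any $n$ trees over $\Sigma$.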

A finite tree automaton (FTA) over $\Sigma$ is a 4-tuple $A = (\Sigma, Q, Q_f, \Delta)$ where $Q$ is a finite set of states, $Q_f \subseteq Q$ is the set of final states and $\Delta \subseteq \bigcup_{n \geq 0} \Sigma_n \times Q^{n+1}$ is a finite set of transitions. The output of $A$, denoted by $\delta$, is the function from $T(\Sigma)$ to $2^Q$ inductively defined for any tree $t = f(t_1, \ldots, t_n)$ by $\delta(t) = \{q \in Q \mid \exists (f, q_1, \ldots, q_n, q) \in \Delta,\ \forall 1 \leq i \leq n,\ q_i \in \delta(t_i)\}$. The accepted language of $A$ is $L(A) = \{t \in T(\Sigma) \mid \delta(t) \cap Q_f \neq \emptyset\}$. The state language $L(q)$ (also known as down language [5]) of a state $q \in Q$ is defined by $L(q) = \{t \in T(\Sigma) \mid q \in \delta(t)\}$. Obviously,

$$L(A) = \bigcup_{q \in Q_f} L(q) \tag{1}$$

In the following, we consider accessible FTAs, that is, FTAs in which every state $q$ satisfies $L(q) \neq \emptyset$. Any FTA admits an equivalent accessible FTA, obtained by removing the states whose down language is empty.

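Given a symbol $c$ in $\Sigma_0$, the $c$-product is the operation $\cdot_c$ defined for any tree $t$ in $T(\Sigma)$ and for any tree language $L$ by

$$t \cdot_c L = \begin{cases} L & \text{if } t = c,\\ \{d\} & \text{if } t = d \in \Sigma_0 \setminus \{c\},\\ f(t_1 \cdot_c L, \ldots, t_n \cdot_c L) & \text{otherwise, if } t = f(t_1, \ldots, t_n), \end{cases} \tag{2}$$

where $f(L_1, \ldots, L_n) = \{f(u_1, \ldots, u_n) \mid \forall 1 \leq i \leq n,\ u_i \in L_i\}$; hence each occurrence of $c$ in $t$ is replaced by a (possibly different) tree of $L$.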

This $c$-product is extended to any two tree languages $L$ and $L'$ by $L \cdot_c L' = \bigcup_{t \in L} t \cdot_c L'$. In the following, we use some equivalences over expressions that rely on properties of the $c$-product; let us state these properties. As is the case for the catenation product of strings, the $c$-product distributes over union:

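Lemma 1. Let $L_1$, $L_2$ and $L_3$ be three tree languages over $\Sigma$. Let $c$ be a symbol in $\Sigma_0$. Then:

$$(L_1 \cup L_2) \cdot_c L_3 = (L_1 \cdot_c L_3) \cup (L_2 \cdot_c L_3)$$

Proof. Let $t$ be a tree in $T(\Sigma)$. Then:

$$\begin{aligned} t \in (L_1 \cup L_2) \cdot_c L_3 &\Leftrightarrow \exists u \in L_1 \cup L_2,\ t \in u \cdot_c L_3\\ &\Leftrightarrow (\exists u \in L_1,\ t \in u \cdot_c L_3) \vee (\exists u \in L_2,\ t \in u \cdot_c L_3)\\ &\Leftrightarrow t \in (L_1 \cdot_c L_3) \cup (L_2 \cdot_c L_3) \end{aligned}$$

□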
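Equation (2), its extension to languages and Lemma 1 can be checked on small finite languages. In the following Python sketch, trees are encoded as nested tuples (`("f", t1, ..., tn)`, constants `("a",)`), an encoding assumed only for illustration; `c_product_tree` and `c_product` are hypothetical names, and only finite languages (Python sets) are handled.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L (Equation (2)): every occurrence of the constant c in t is
    replaced, independently, by some tree of the finite language L."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    # f(t1 ._c L, ..., tn ._c L): all combinations of the children's results
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """L1 ._c L2 = union of the sets t ._c L2 for t in L1."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


a, b, c = ("a",), ("b",), ("c",)
# The two occurrences of c in f(c, c) are replaced independently: 4 trees.
print(len(c_product_tree(("f", c, c), "c", {a, b})))  # 4

# Lemma 1 on a sample: (L1 | L2) ._c L3 == (L1 ._c L3) | (L2 ._c L3)
L1, L2, L3 = {("f", c, a)}, {("g", c), b}, {a, ("g", b)}
assert c_product(L1 | L2, "c", L3) == c_product(L1, "c", L3) | c_product(L2, "c", L3)
```

Note that if $c$ occurs in $t$ and $L = \emptyset$, the sketch returns $t \cdot_c L = \emptyset$, whereas a tree without $c$ is left unchanged, as Equation (2) prescribes.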


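Another property shared with the catenation product is that each operator $\cdot_c$ is associative:

Lemma 2. Let $t$ and $t'$ be any two trees in $T(\Sigma)$, let $L$ be a tree language over $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then (where $t \cdot_c t'$ stands for $t \cdot_c \{t'\}$):

$$t \cdot_c (t' \cdot_c L) = (t \cdot_c t') \cdot_c L$$

Proof. By induction over the structure of $t$.

1. Suppose that $t = c$. Then $t \cdot_c (t' \cdot_c L) = t' \cdot_c L = (t \cdot_c t') \cdot_c L$.

2. Suppose that $t \in \Sigma_0 \setminus \{c\}$. Then $t \cdot_c (t' \cdot_c L) = \{t\} = (t \cdot_c t') \cdot_c L$.

3. Suppose that $t = f(t_1, \ldots, t_n)$ with $n > 0$. Then, following Equation (2):

$$\begin{aligned} f(t_1, \ldots, t_n) \cdot_c (t' \cdot_c L) &= f(t_1 \cdot_c (t' \cdot_c L), \ldots, t_n \cdot_c (t' \cdot_c L))\\ &= f((t_1 \cdot_c t') \cdot_c L, \ldots, (t_n \cdot_c t') \cdot_c L) && \text{(Induction hypothesis)}\\ &= f(t_1 \cdot_c t', \ldots, t_n \cdot_c t') \cdot_c L\\ &= (f(t_1, \ldots, t_n) \cdot_c t') \cdot_c L \end{aligned}$$

□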

Corollary 1. Let $L$, $L'$ and $L''$ be any three tree languages over a graded alphabet $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then:

$$L \cdot_c (L' \cdot_c L'') = (L \cdot_c L') \cdot_c L''$$

However, associativity does not necessarily hold if the substitution symbols are different; for example, $(f(a, b) \cdot_a b) \cdot_b c = \{f(c, c)\}$, whereas $f(a, b) \cdot_a (b \cdot_b c) = \{f(c, b)\}$. Finally, the last common property is that the operation $\cdot_c$ is compatible with inclusion:

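Lemma 3. Let $t$ be a tree over $\Sigma$, let $c$ be a symbol in $\Sigma_0$ and let $L \subseteq L'$ be two tree languages over $\Sigma$. Then:

$$t \cdot_c L \subseteq t \cdot_c L'$$

Proof. By induction over the structure of $t$.

1. Suppose that $t = c$. Then $c \cdot_c L = L \subseteq L' = c \cdot_c L'$.

2. Suppose that $t \in \Sigma_0 \setminus \{c\}$. Then $t \cdot_c L = \{t\} = t \cdot_c L'$.

3. Suppose that $t = f(t_1, \ldots, t_n)$ with $n > 0$. Then

$$f(t_1, \ldots, t_n) \cdot_c L = f(t_1 \cdot_c L, \ldots, t_n \cdot_c L)$$

By induction hypothesis,

$$\forall 1 \leq j \leq n,\ t_j \cdot_c L \subseteq t_j \cdot_c L'$$

Therefore,

$$f(t_1 \cdot_c L, \ldots, t_n \cdot_c L) \subseteq f(t_1 \cdot_c L', \ldots, t_n \cdot_c L') = t \cdot_c L'$$

□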


Corollary 2. Let $L$, $L'$ and $L''$ be any three tree languages over $\Sigma$ such that $L' \subseteq L''$, and let $c$ be a symbol in $\Sigma_0$. Then:

$$L \cdot_c L' \subseteq L \cdot_c L''$$
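The associativity of a single product (Corollary 1), its failure for two different symbols, and the compatibility with inclusion (Corollary 2) can be observed on small finite languages. This Python sketch encodes trees as nested tuples (`("f", t1, ..., tn)`, constants `("a",)`), an encoding assumed only for illustration; the helper names and sample languages are hypothetical choices.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L following Equation (2); L is a finite set of trees."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """Extension of ._c to languages."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


a, b, c = ("a",), ("b",), ("c",)

# Corollary 1 (same symbol c): L ._c (L' ._c L'') == (L ._c L') ._c L''
L, Lp, Lpp = {("f", c, c), a}, {("g", c), b}, {a, b}
assert c_product(L, "c", c_product(Lp, "c", Lpp)) == \
    c_product(c_product(L, "c", Lp), "c", Lpp)

# Different symbols: (f(a, b) ._a b) ._b c differs from f(a, b) ._a (b ._b c)
f_ab = ("f", a, b)
left = c_product(c_product({f_ab}, "a", {b}), "b", {c})
right = c_product({f_ab}, "a", c_product({b}, "b", {c}))
print(left, right)  # {('f', ('c',), ('c',))} {('f', ('c',), ('b',))}

# Corollary 2: L' included in L'' implies L ._c L' included in L ._c L''
assert c_product(L, "c", {a}) <= c_product(L, "c", {a, b})
```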


The first property not shared with the classical catenation product is that the c-product may 
distribute over other products: 

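Lemma 4. Let $t_1$, $t_2$ and $t_3$ be any three trees in $T(\Sigma)$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $a$ does not appear in $t_3$. Then:

$$(t_1 \cdot_a t_2) \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$$

Proof. By induction over $t_1$.

1. If $t_1 = a$, then

$$(t_1 \cdot_a t_2) \cdot_b t_3 = t_2 \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$$

2. If $t_1 = b$, then, since $a$ does not appear in $t_3$,

$$(t_1 \cdot_a t_2) \cdot_b t_3 = t_3 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$$

3. If $t_1 = c \in \Sigma_0 \setminus \{a, b\}$, then

$$(t_1 \cdot_a t_2) \cdot_b t_3 = t_1 = (t_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)$$

4. If $t_1 = f(u_1, \ldots, u_n)$ with $n > 0$, then, following Equation (2):

$$\begin{aligned} (t_1 \cdot_a t_2) \cdot_b t_3 &= f(u_1 \cdot_a t_2, \ldots, u_n \cdot_a t_2) \cdot_b t_3\\ &= f((u_1 \cdot_a t_2) \cdot_b t_3, \ldots, (u_n \cdot_a t_2) \cdot_b t_3)\\ &= f((u_1 \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3), \ldots, (u_n \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)) && \text{(Induction hypothesis)}\\ &= f(u_1 \cdot_b t_3, \ldots, u_n \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3)\\ &= (f(u_1, \ldots, u_n) \cdot_b t_3) \cdot_a (t_2 \cdot_b t_3) \end{aligned}$$

□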

Corollary 3. Let $L_1$, $L_2$ and $L_3$ be any three tree languages over $\Sigma$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $L_3 \subseteq T(\Sigma \setminus \{a\})$. Then:

$$(L_1 \cdot_a L_2) \cdot_b L_3 = (L_1 \cdot_b L_3) \cdot_a (L_2 \cdot_b L_3)$$

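In some particular cases, two products commute:

Lemma 5. Let $t_1$, $t_2$ and $t_3$ be any three trees in $T(\Sigma)$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $a$ does not appear in $t_3$ and $b$ does not appear in $t_2$. Then:

$$(t_1 \cdot_a t_2) \cdot_b t_3 = (t_1 \cdot_b t_3) \cdot_a t_2$$

Proof. By induction over $t_1$.

1. If $t_1 = a$, then

$$\begin{aligned} (t_1 \cdot_a t_2) \cdot_b t_3 &= (a \cdot_a t_2) \cdot_b t_3 = t_2 \cdot_b t_3\\ &= t_2 = a \cdot_a t_2\\ &= (a \cdot_b t_3) \cdot_a t_2 = (t_1 \cdot_b t_3) \cdot_a t_2 \end{aligned}$$

2. If $t_1 = b$, then

$$\begin{aligned} (t_1 \cdot_a t_2) \cdot_b t_3 &= (b \cdot_a t_2) \cdot_b t_3 = b \cdot_b t_3\\ &= t_3 = t_3 \cdot_a t_2\\ &= (b \cdot_b t_3) \cdot_a t_2 = (t_1 \cdot_b t_3) \cdot_a t_2 \end{aligned}$$

3. If $t_1 = c \in \Sigma_0 \setminus \{a, b\}$, then

$$\begin{aligned} (t_1 \cdot_a t_2) \cdot_b t_3 &= (c \cdot_a t_2) \cdot_b t_3 = c \cdot_b t_3\\ &= c = c \cdot_a t_2\\ &= (c \cdot_b t_3) \cdot_a t_2 = (t_1 \cdot_b t_3) \cdot_a t_2 \end{aligned}$$

4. If $t_1 = f(u_1, \ldots, u_n)$ with $n > 0$, then, following Equation (2):

$$\begin{aligned} (t_1 \cdot_a t_2) \cdot_b t_3 &= f(u_1 \cdot_a t_2, \ldots, u_n \cdot_a t_2) \cdot_b t_3\\ &= f((u_1 \cdot_a t_2) \cdot_b t_3, \ldots, (u_n \cdot_a t_2) \cdot_b t_3)\\ &= f((u_1 \cdot_b t_3) \cdot_a t_2, \ldots, (u_n \cdot_b t_3) \cdot_a t_2) && \text{(Induction hypothesis)}\\ &= f(u_1 \cdot_b t_3, \ldots, u_n \cdot_b t_3) \cdot_a t_2\\ &= (t_1 \cdot_b t_3) \cdot_a t_2 \end{aligned}$$

□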
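Lemmas 4 and 5 can be checked on concrete trees, together with the role of the hypothesis that $b$ does not appear in $t_2$. This Python sketch uses a nested-tuple encoding of trees (`("f", t1, ..., tn)`, constants `("a",)`) assumed only for illustration; the helper names and the sample trees are hypothetical.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L following Equation (2); L is a finite set of trees."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """Extension of ._c to languages."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


t1 = ("f", ("a",), ("g", ("b",)))
t3 = ("g", ("c",))                     # a does not occur in t3

# Lemma 4: (t1 ._a t2) ._b t3 == (t1 ._b t3) ._a (t2 ._b t3)
t2 = ("h", ("b",), ("c",))
lemma4_lhs = c_product(c_product({t1}, "a", {t2}), "b", {t3})
lemma4_rhs = c_product(c_product({t1}, "b", {t3}), "a", c_product({t2}, "b", {t3}))
assert lemma4_lhs == lemma4_rhs

# Lemma 5 (b does not occur in t2_no_b): the two products commute
t2_no_b = ("h", ("c",), ("c",))
assert c_product(c_product({t1}, "a", {t2_no_b}), "b", {t3}) == \
    c_product(c_product({t1}, "b", {t3}), "a", {t2_no_b})

# If b occurs in t2, swapping the products changes the result
swapped = c_product(c_product({t1}, "b", {t3}), "a", {t2})
print(lemma4_lhs == swapped)  # False
```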


The iterated $c$-product is the operation recursively defined for any integer $n$ by:

$$L^{0,c} = \{c\} \tag{3}$$

$$L^{n+1,c} = L^{n,c} \cup L \cdot_c L^{n,c} \tag{4}$$

The $c$-closure is the operation defined by $L^{*_c} = \bigcup_{n \geq 0} L^{n,c}$. Notice that, unlike in the string case, products may commute with the closure in some cases:

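Lemma 6. Let $L_1$ and $L_2$ be any two tree languages over $\Sigma$. Let $a$ and $b$ be two distinct symbols in $\Sigma_0$ such that $L_2 \subseteq T(\Sigma \setminus \{a\})$. Then:

$$L_1^{*_a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{*_a}$$

Proof. Let us show by recurrence over the integer $n$ that $L_1^{n,a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{n,a}$.

1. If $n = 0$, then, according to Equation (3):

$$L_1^{0,a} \cdot_b L_2 = \{a\} \cdot_b L_2 = \{a\} = (L_1 \cdot_b L_2)^{0,a}$$

2. Suppose that the equality holds for some integer $n \geq 0$. Then, following Equation (4):

$$\begin{aligned} L_1^{n+1,a} \cdot_b L_2 &= (L_1^{n,a} \cup L_1 \cdot_a L_1^{n,a}) \cdot_b L_2\\ &= (L_1^{n,a} \cdot_b L_2) \cup ((L_1 \cdot_a L_1^{n,a}) \cdot_b L_2) && \text{(Lemma 1)}\\ &= (L_1^{n,a} \cdot_b L_2) \cup ((L_1 \cdot_b L_2) \cdot_a (L_1^{n,a} \cdot_b L_2)) && \text{(Corollary 3)}\\ &= (L_1 \cdot_b L_2)^{n,a} \cup ((L_1 \cdot_b L_2) \cdot_a (L_1 \cdot_b L_2)^{n,a}) && \text{(Induction hypothesis)}\\ &= (L_1 \cdot_b L_2)^{n+1,a} \end{aligned}$$

As a direct consequence, since the product by $L_2$ on the right distributes over arbitrary unions by definition of the extension of $\cdot_b$ to languages, $L_1^{*_a} \cdot_b L_2 = (L_1 \cdot_b L_2)^{*_a}$. □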
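Equations (3) and (4) and Lemma 6 can be checked level by level, since the closure is the union of the iterated products. This Python sketch (hypothetical helper names, trees encoded as nested tuples `("f", t1, ..., tn)` with constants `("a",)`, an encoding assumed only for illustration) compares $L_1^{n,a} \cdot_b L_2$ and $(L_1 \cdot_b L_2)^{n,a}$ for small $n$; it cannot compute the infinite closure itself.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L following Equation (2); L is a finite set of trees."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """Extension of ._c to languages."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


def iterated(L, c, n):
    """L^{n,c}: L^{0,c} = {c} and L^{k+1,c} = L^{k,c} | L ._c L^{k,c}."""
    result = {(c,)}
    for _ in range(n):
        result = result | c_product(L, c, result)
    return result


L1 = {("f", ("a",), ("b",))}
L2 = {("c",), ("g", ("c",))}           # a does not occur in L2

# Lemma 6 level by level: L1^{n,a} ._b L2 == (L1 ._b L2)^{n,a}
for n in range(4):
    assert c_product(iterated(L1, "a", n), "b", L2) == \
        iterated(c_product(L1, "b", L2), "a", n)

# If a occurs in L2, the two sides already differ at level 2
bad = {("a",)}
print(c_product(iterated(L1, "a", 2), "b", bad)
      == iterated(c_product(L1, "b", bad), "a", 2))  # False
```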

A rational expression $E$ over $\Sigma$ is inductively defined by:

$$E = \emptyset,\quad E = f(E_1, \ldots, E_n),\quad E = E_1 + E_2,\quad E = E_1 \cdot_c E_2,\quad E = E_1^{*_c}$$

where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1, \ldots, E_n$ are any $n$ rational expressions. The language denoted by $E$ is the tree language $L(E)$ inductively defined by:

$$L(\emptyset) = \emptyset,\quad L(f(E_1, \ldots, E_n)) = f(L(E_1), \ldots, L(E_n)),$$

$$L(E_1 + E_2) = L(E_1) \cup L(E_2),\quad L(E_1 \cdot_c E_2) = L(E_1) \cdot_c L(E_2),\quad L(E_1^{*_c}) = (L(E_1))^{*_c}$$

where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1, \ldots, E_n$ are any $n$ rational expressions.
In the following, we consider that rational expressions may include variables. Let $X = \{x_1, \ldots, x_k\}$ be a set of $k$ variables. A rational expression $E$ over $(\Sigma, X)$ is inductively defined by:

$$E = \emptyset,\quad E = x_j,\quad E = f(E_1, \ldots, E_n),\quad E = E_1 + E_2,\quad E = E_1 \cdot_c E_2,\quad E = E_1^{*_c}$$

where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$, $1 \leq j \leq k$ is any integer and $E_1, \ldots, E_n$ are any $n$ rational expressions over $(\Sigma, X)$. The language denoted by an expression with variables needs a context to be computed: each variable has to be evaluated as a tree language. Let $\mathcal{L} = (L_1, \ldots, L_k)$ be a $k$-tuple of tree languages over $\Sigma$. The $\mathcal{L}$-language denoted by $E$ is the tree language $L_{\mathcal{L}}(E)$ inductively defined by:

$$L_{\mathcal{L}}(\emptyset) = \emptyset,\quad L_{\mathcal{L}}(x_j) = L_j,\quad L_{\mathcal{L}}(f(E_1, \ldots, E_n)) = f(L_{\mathcal{L}}(E_1), \ldots, L_{\mathcal{L}}(E_n)),$$

$$L_{\mathcal{L}}(E_1 + E_2) = L_{\mathcal{L}}(E_1) \cup L_{\mathcal{L}}(E_2),\quad L_{\mathcal{L}}(E_1 \cdot_c E_2) = L_{\mathcal{L}}(E_1) \cdot_c L_{\mathcal{L}}(E_2),\quad L_{\mathcal{L}}(E_1^{*_c}) = (L_{\mathcal{L}}(E_1))^{*_c}$$

where $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$, $1 \leq j \leq k$ is any integer and $E_1, \ldots, E_n$ are any $n$ rational expressions over $(\Sigma, X)$. Two rational expressions $E$ and $F$ with variables are equivalent, denoted by $E \sim F$, if for any tuple $\mathcal{L}$ of languages over $\Sigma$, $L_{\mathcal{L}}(E) = L_{\mathcal{L}}(F)$. Let $\Gamma \subseteq \Sigma$. Two rational expressions $E$ and $F$ with variables are $\Gamma$-equivalent, denoted by $E \sim_{\Gamma} F$, if for any tuple $\mathcal{L}$ of languages over $\Gamma$, $L_{\mathcal{L}}(E) = L_{\mathcal{L}}(F)$. By definition,

$$E \sim F \Rightarrow E \sim_{\Gamma} F \tag{5}$$

Notice that any expression over $(\Sigma, X)$ is also an expression over $\Sigma \cup X$, the variables being then seen as constant symbols. However, two expressions that are equivalent as rational expressions over $\Sigma \cup X$ are not necessarily equivalent as rational expressions over $(\Sigma, X)$. As an example, $x \cdot_a b$ is equivalent to $x$ as an expression over $\{a, b, x\}$, but not as an expression over $(\{a, b\}, \{x\})$:

$$L(x \cdot_a b) = \{x\} = L(x)$$

$$L_{(\{a\})}(x \cdot_a b) = \{b\} \neq L_{(\{a\})}(x) = \{a\}$$

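In the following, we denote by $(E)_{x \leftarrow E'}$ the expression obtained by substituting the expression $E'$ for every occurrence of the variable $x$ in the expression $E$. This transformation is inductively defined as follows:

$$(a)_{x \leftarrow E'} = a,\qquad (\emptyset)_{x \leftarrow E'} = \emptyset,\qquad (y)_{x \leftarrow E'} = y,\qquad (x)_{x \leftarrow E'} = E',$$

$$(f(E_1, \ldots, E_n))_{x \leftarrow E'} = f((E_1)_{x \leftarrow E'}, \ldots, (E_n)_{x \leftarrow E'}),$$

$$(E_1 + E_2)_{x \leftarrow E'} = (E_1)_{x \leftarrow E'} + (E_2)_{x \leftarrow E'},\qquad (E_1 \cdot_c E_2)_{x \leftarrow E'} = (E_1)_{x \leftarrow E'} \cdot_c (E_2)_{x \leftarrow E'},$$

$$(E_1^{*_c})_{x \leftarrow E'} = ((E_1)_{x \leftarrow E'})^{*_c}$$

where $a$ is any symbol in $\Sigma_0$, $x \neq y$ are two variables in $X$, $f$ is any symbol in $\Sigma_n$, $c$ is any symbol in $\Sigma_0$ and $E_1, \ldots, E_n$ are any $n$ rational expressions over $(\Sigma, X)$. This transformation preserves the language in the following case: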

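Lemma 7. Let $E$ be an expression over an alphabet $\Sigma$ and over a set $X = \{x_1, \ldots, x_n\}$ of variables. Let $F$ be a rational expression over $(\Sigma, X)$. Let $x_j$ be a variable in $X$. Let $\mathcal{L} = (L_1, \ldots, L_n)$ be an $n$-tuple of tree languages such that $L_j = L_{\mathcal{L}}(F)$. Then:

$$L_{\mathcal{L}}((E)_{x_j \leftarrow F}) = L_{\mathcal{L}}(E)$$

Proof. By induction over the structure of $E$.

1. If $E \in \{a, y, \emptyset\}$ with $a \in \Sigma_0$ and $y \neq x_j$, then $(E)_{x_j \leftarrow F} = E$.

2. If $E = x_j$, then $(E)_{x_j \leftarrow F} = F$. Therefore

$$L_{\mathcal{L}}((E)_{x_j \leftarrow F}) = L_{\mathcal{L}}(F) = L_j = L_{\mathcal{L}}(x_j) = L_{\mathcal{L}}(E)$$

3. If $E = f(E_1, \ldots, E_k)$ with $f \in \Sigma_k$ and $k > 0$, then:

$$\begin{aligned} L_{\mathcal{L}}((E)_{x_j \leftarrow F}) &= L_{\mathcal{L}}(f((E_1)_{x_j \leftarrow F}, \ldots, (E_k)_{x_j \leftarrow F}))\\ &= f(L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}), \ldots, L_{\mathcal{L}}((E_k)_{x_j \leftarrow F}))\\ &= f(L_{\mathcal{L}}(E_1), \ldots, L_{\mathcal{L}}(E_k)) && \text{(Induction hypothesis)}\\ &= L_{\mathcal{L}}(f(E_1, \ldots, E_k)) \end{aligned}$$

4. If $E = E_1 + E_2$, then:

$$\begin{aligned} L_{\mathcal{L}}((E_1 + E_2)_{x_j \leftarrow F}) &= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F} + (E_2)_{x_j \leftarrow F})\\ &= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}) \cup L_{\mathcal{L}}((E_2)_{x_j \leftarrow F})\\ &= L_{\mathcal{L}}(E_1) \cup L_{\mathcal{L}}(E_2) && \text{(Induction hypothesis)}\\ &= L_{\mathcal{L}}(E_1 + E_2) \end{aligned}$$

5. If $E = E_1 \cdot_c E_2$, then:

$$\begin{aligned} L_{\mathcal{L}}((E_1 \cdot_c E_2)_{x_j \leftarrow F}) &= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F} \cdot_c (E_2)_{x_j \leftarrow F})\\ &= L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}) \cdot_c L_{\mathcal{L}}((E_2)_{x_j \leftarrow F})\\ &= L_{\mathcal{L}}(E_1) \cdot_c L_{\mathcal{L}}(E_2) && \text{(Induction hypothesis)}\\ &= L_{\mathcal{L}}(E_1 \cdot_c E_2) \end{aligned}$$

6. If $E = E_1^{*_c}$, then:

$$\begin{aligned} L_{\mathcal{L}}((E_1^{*_c})_{x_j \leftarrow F}) &= (L_{\mathcal{L}}((E_1)_{x_j \leftarrow F}))^{*_c}\\ &= (L_{\mathcal{L}}(E_1))^{*_c} && \text{(Induction hypothesis)}\\ &= L_{\mathcal{L}}(E_1^{*_c}) \end{aligned}$$

□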


In the following, we denote by $\mathrm{op}(E)$ the set of the operators that appear in a rational expression $E$. The previous substitution can be used to factorize an expression with respect to a variable. However, this operation does not preserve equivalence; e.g., the factorization of $x \cdot_b c$ with respect to $x$ is $(a \cdot_b c) \cdot_a x$, and

$$L_{(\{b\})}(x \cdot_b c) = \{c\} \neq L_{(\{b\})}((a \cdot_b c) \cdot_a x) = \{b\}$$

Nevertheless, this operation preserves the language if the languages are taken over a restricted alphabet:

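Proposition 1. Let $E$ be a rational expression over a graded alphabet $\Sigma$ and over a set $X$ of variables. Let $x$ be a variable in $X$. Let $\Gamma \subseteq \Sigma$ be the subset defined by $\Gamma = \{b \in \Sigma_0 \mid \{\cdot_b, *_b\} \cap \mathrm{op}(E) \neq \emptyset\}$. Let $a$ be a symbol not in $\Sigma$. Then:

$$E \sim_{\Sigma \setminus \Gamma} (E)_{x \leftarrow a} \cdot_a x$$

Proof. By induction over the structure of $E$.

1. If $E = x$, then, since $x \sim a \cdot_a x$, it holds from Equation (5) that $E \sim_{\Sigma \setminus \Gamma} (E)_{x \leftarrow a} \cdot_a x$.

2. If $E \in \Sigma_0 \cup \{\emptyset\} \cup X \setminus \{x\}$, then, since $x$ does not appear in $E$, it holds that $E = (E)_{x \leftarrow a}$. Moreover, since $a \notin \Sigma$, the symbol $a$ does not appear in $L_{\mathcal{L}}(E)$ for any tuple $\mathcal{L}$ of languages over $\Sigma \setminus \Gamma$; therefore $E \cdot_a x \sim_{\Sigma \setminus \Gamma} E$.

3. If $E = f(E_1, \ldots, E_n)$, then

$$\begin{aligned} (f(E_1, \ldots, E_n))_{x \leftarrow a} \cdot_a x &= f((E_1)_{x \leftarrow a}, \ldots, (E_n)_{x \leftarrow a}) \cdot_a x\\ &\sim f((E_1)_{x \leftarrow a} \cdot_a x, \ldots, (E_n)_{x \leftarrow a} \cdot_a x) && \text{(Equation (2))}\\ &\sim_{\Sigma \setminus \Gamma} f(E_1, \ldots, E_n) && \text{(Induction hypothesis)} \end{aligned}$$

4. If $E = E_1 + E_2$, then

$$\begin{aligned} (E_1 + E_2)_{x \leftarrow a} \cdot_a x &= ((E_1)_{x \leftarrow a} + (E_2)_{x \leftarrow a}) \cdot_a x\\ &\sim (E_1)_{x \leftarrow a} \cdot_a x + (E_2)_{x \leftarrow a} \cdot_a x && \text{(Lemma 1)}\\ &\sim_{\Sigma \setminus \Gamma} E_1 + E_2 && \text{(Induction hypothesis)} \end{aligned}$$

5. If $E = E_1 \cdot_c E_2$, then, since $c \in \Gamma$ does not appear in the languages over $\Sigma \setminus \Gamma$ assigned to $x$:

$$\begin{aligned} (E_1 \cdot_c E_2)_{x \leftarrow a} \cdot_a x &= ((E_1)_{x \leftarrow a} \cdot_c (E_2)_{x \leftarrow a}) \cdot_a x\\ &\sim_{\Sigma \setminus \Gamma} ((E_1)_{x \leftarrow a} \cdot_a x) \cdot_c ((E_2)_{x \leftarrow a} \cdot_a x) && \text{(Corollary 3)}\\ &\sim_{\Sigma \setminus \Gamma} E_1 \cdot_c E_2 && \text{(Induction hypothesis)} \end{aligned}$$

6. If $E = E_1^{*_c}$, then, for the same reason:

$$\begin{aligned} (E_1^{*_c})_{x \leftarrow a} \cdot_a x &= ((E_1)_{x \leftarrow a})^{*_c} \cdot_a x\\ &\sim_{\Sigma \setminus \Gamma} ((E_1)_{x \leftarrow a} \cdot_a x)^{*_c} && \text{(Lemma 6)}\\ &\sim_{\Sigma \setminus \Gamma} E_1^{*_c} && \text{(Induction hypothesis)} \end{aligned}$$

□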
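The counterexample above and the restriction to $\Sigma \setminus \Gamma$ in Proposition 1 can be observed numerically for $E = x \cdot_b c$, for which $\Gamma = \{b\}$. This Python sketch evaluates $E$ and its factorization $(a \cdot_b c) \cdot_a x$ for a given finite value of the variable $x$. Trees are encoded as nested tuples, an assumption made only for illustration, and the helper names and sample languages are hypothetical.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L following Equation (2); L is a finite set of trees."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """Extension of ._c to languages."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


ta, tb, tc = ("a",), ("b",), ("c",)   # a plays the role of the fresh symbol


def original(x):
    """Value of E = x ._b c when the variable x denotes the language x."""
    return c_product(x, "b", {tc})


def factorized(x):
    """Value of (E)_{x <- a} ._a x = (a ._b c) ._a x."""
    return c_product(c_product({ta}, "b", {tc}), "a", x)


print(original({tb}), factorized({tb}))  # {('c',)} {('b',)}

# Gamma = {b}: on a language over Sigma minus Gamma (no b), both sides agree.
sample = {("d",), ("h", ("c",)), ("h", ("d",))}
assert original(sample) == factorized(sample) == sample
```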


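3 Equation Systems for Tree Languages

Let $\Sigma$ be an alphabet and $\mathbb{E} = \{\mathbb{E}_1, \ldots, \mathbb{E}_n\}$ be a set of $n$ variables. An equation over $(\Sigma, \mathbb{E})$ is an expression $\mathbb{E}_j = F_j$, where $1 \leq j \leq n$ is an integer and $F_j$ is a rational expression over $(\Sigma, \mathbb{E})$. An equation system over $(\Sigma, \mathbb{E})$ is a set $\mathcal{X} = \{\mathbb{E}_j = F_j \mid 1 \leq j \leq n\}$ of $n$ equations. Let $\mathcal{L} = (L_1, \ldots, L_n)$ be an $n$-tuple of tree languages. The tuple $\mathcal{L}$ is a solution of an equation $(\mathbb{E}_j = F_j)$ if $L_j = L_{\mathcal{L}}(F_j)$. The tuple $\mathcal{L}$ is a solution of $\mathcal{X}$ if, for every equation $(\mathbb{E}_j = F_j)$ in $\mathcal{X}$, $\mathcal{L}$ is a solution of $(\mathbb{E}_j = F_j)$.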

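Example 1. Let us define the equation system $\mathcal{X}$ as follows:

$$\mathcal{X} = \begin{cases} \mathbb{E}_1 = f(\mathbb{E}_1, \mathbb{E}_1) + f(\mathbb{E}_2, \mathbb{E}_4)\\ \mathbb{E}_2 = b + f(\mathbb{E}_2, \mathbb{E}_4)\\ \mathbb{E}_3 = a + h(\mathbb{E}_4)\\ \mathbb{E}_4 = a + h(\mathbb{E}_3) \end{cases}$$

The tuple $(\emptyset, \emptyset, \emptyset, \emptyset)$ is a solution of the equation $\mathbb{E}_1 = F_1$, but not of the system $\mathcal{X}$, since $b \in L_{(\emptyset, \emptyset, \emptyset, \emptyset)}(F_2)$.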

Two systems over the same variables are equivalent if they admit the same solutions. Notice that a system does not necessarily admit a unique solution: for example, any language is a solution of the system $\{\mathbb{E}_1 = \mathbb{E}_1\}$. Obviously,

Proposition 2. If $\mathcal{X}$ only contains equations $\mathbb{E}_j = F_j$ with $F_j$ a rational expression without variables, then $(L(F_1), \ldots, L(F_n))$ is the unique solution of $\mathcal{X}$.

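Let us now define the operation of substitution, which computes an equivalent system.

Definition 1. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. The substitution of $(\mathbb{E}_k = F_k)$ in $\mathcal{X}$ is the system $\mathcal{X}^k = \{\mathbb{E}_k = F_k\} \cup \{\mathbb{E}_j = (F_j)_{\mathbb{E}_k \leftarrow F_k} \mid j \neq k \wedge 1 \leq j \leq n\}$.

As a direct consequence of Lemma 7:

Proposition 3. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. Let $\mathbb{E}_k = F_k$ be an equation in $\mathcal{X}$. Let $\mathcal{L}$ be a solution of $\mathcal{X}$. Then for any integer $1 \leq j \leq n$ with $j \neq k$,

$$\mathcal{L} \text{ is a solution of } \mathbb{E}_j = (F_j)_{\mathbb{E}_k \leftarrow F_k}.$$

And following Proposition 3:

Proposition 4. Let $\mathcal{X}$ be an equation system over $n$ variables. Let $k \leq n$ be an integer. Then $\mathcal{X}$ and $\mathcal{X}^k$ are equivalent.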


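Example 2. Let us consider the system $\mathcal{X}$ of Example 1. Then:

$$\mathcal{X}^4 = \begin{cases} \mathbb{E}_1 = f(\mathbb{E}_1, \mathbb{E}_1) + f(\mathbb{E}_2, a + h(\mathbb{E}_3))\\ \mathbb{E}_2 = b + f(\mathbb{E}_2, a + h(\mathbb{E}_3))\\ \mathbb{E}_3 = a + h(a + h(\mathbb{E}_3))\\ \mathbb{E}_4 = a + h(\mathbb{E}_3) \end{cases}$$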


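Let us determine a particular case that can be solved by successive substitutions. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. The relation $\prec_{\mathcal{X}}$ is defined for any two variables $\mathbb{E}_j$ and $\mathbb{E}_k$ by

$$\mathbb{E}_j \prec_{\mathcal{X}} \mathbb{E}_k \Leftrightarrow \mathbb{E}_j \text{ appears in } F_k$$

The relation $\preceq_{\mathcal{X}}$ is defined as the transitive closure of $\prec_{\mathcal{X}}$. In the case where $\mathbb{E}_k \preceq_{\mathcal{X}} \mathbb{E}_k$, the equation $\mathbb{E}_k = F_k$ is said to be recursive. A system is said to be recursive if there exist two variables $\mathbb{E}_j$ and $\mathbb{E}_k$ such that $\mathbb{E}_j \preceq_{\mathcal{X}} \mathbb{E}_k$ and $\mathbb{E}_k \preceq_{\mathcal{X}} \mathbb{E}_j$. If a system is not recursive, it can be solved by successive substitutions. If $\mathbb{E}_k$ is a variable that does not appear in any right-hand side of an equation of $\mathcal{X}$, we denote by $\mathcal{X} \setminus (\mathbb{E}_k = F_k)$ the system obtained by removing $\mathbb{E}_k = F_k$ from $\mathcal{X}$ and by reindexing every variable $\mathbb{E}_j$ with $j > k$ as $\mathbb{E}_{j-1}$.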
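Both relations are finite graph computations. The Python sketch below represents a system only through its dependencies (`deps[k]` is the set of variables occurring in $F_k$, an encoding assumed for illustration), computes the recursive equations through the transitive closure of $\prec_{\mathcal{X}}$, and, for a non-recursive system, returns an order in which successive substitutions can be performed. The function names are hypothetical, and the returned order is one valid choice among several.

```python
def below(deps, k):
    """All variables E_j with E_j <= E_k for the transitive closure of
    'E_j appears in F_k'; deps[k] is the set of variables of F_k."""
    seen, stack = set(), list(deps[k])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(deps[v])
    return seen


def recursive_equations(deps):
    """Variables E_k whose equation E_k = F_k is recursive (E_k <= E_k)."""
    return {k for k in deps if k in below(deps, k)}


def substitution_order(deps):
    """For a non-recursive system, an order in which every variable comes
    after the variables of its right-hand side (a topological order)."""
    if recursive_equations(deps):
        raise ValueError("the system is recursive")
    order, done = [], set()

    def visit(k):
        if k not in done:
            for j in sorted(deps[k]):
                visit(j)
            done.add(k)
            order.append(k)

    for k in sorted(deps):
        visit(k)
    return order


# System X of Example 1: every equation is recursive.
example1 = {"E1": {"E1", "E2", "E4"}, "E2": {"E2", "E4"},
            "E3": {"E4"}, "E4": {"E3"}}
print(sorted(recursive_equations(example1)))  # ['E1', 'E2', 'E3', 'E4']

# System Y of Example 3: not recursive.
example3 = {"E1": {"E2", "E3"}, "E2": {"E4"}, "E3": {"E4"}, "E4": set()}
print(substitution_order(example3))  # ['E4', 'E2', 'E3', 'E1']
```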




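Lemma 8. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system over a graded alphabet $\Sigma$ and over $n$ variables $\{\mathbb{E}_1, \ldots, \mathbb{E}_n\}$. Let $\mathbb{E}_k = F_k$ be an equation in $\mathcal{X}$ that is not recursive. Then for any $(n-1)$-tuple $\mathcal{Z} = (L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$, the two following conditions are equivalent:

1. $(L_1, \ldots, L_{k-1}, L_{\mathcal{Z}}(F_k), L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}$;

2. $(L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}^k \setminus (\mathbb{E}_k = F_k)$.

Proof. Let $\mathcal{L} = (L_1, \ldots, L_{k-1}, L_{\mathcal{Z}}(F_k), L_{k+1}, \ldots, L_n)$ and $\mathcal{L}' = (L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$. Since $\mathbb{E}_k$ does not appear in $F_k$, $L_{\mathcal{L}}(F_k) = L_{\mathcal{Z}}(F_k)$; hence $\mathcal{L}$ is a solution of the (non-recursive) equation $\mathbb{E}_k = F_k$. From Proposition 4,

$$\mathcal{L} \text{ is a solution of } \mathcal{X} \Leftrightarrow \mathcal{L} \text{ is a solution of } \mathcal{X}^k$$

Consequently, for any integer $j \neq k$,

$$\mathcal{L} \text{ is a solution of } \mathbb{E}_j = F_j \Leftrightarrow \mathcal{L} \text{ is a solution of } \mathbb{E}_j = (F_j)_{\mathbb{E}_k \leftarrow F_k}$$

Moreover, since $\mathbb{E}_k$ does not appear in $(F_j)_{\mathbb{E}_k \leftarrow F_k}$, by definition of $\mathcal{L}'$, for any integer $j \neq k$,

$$\mathcal{L} \text{ is a solution of } \mathbb{E}_j = (F_j)_{\mathbb{E}_k \leftarrow F_k} \Leftrightarrow \mathcal{L}' \text{ is a solution of } \mathbb{E}_j = (F_j)_{\mathbb{E}_k \leftarrow F_k}$$

Therefore $\mathcal{L}$ is a solution of $\mathcal{X}$ if and only if $\mathcal{L}'$ is a solution of $\mathcal{X}^k \setminus (\mathbb{E}_k = F_k)$.

□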

As a direct consequence of the previous lemma, a non-recursive system can be solved by solving a smaller system, obtained by substitution:

Corollary 4. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system over a graded alphabet $\Sigma$ and over $n$ variables $\{\mathbb{E}_1, \ldots, \mathbb{E}_n\}$. Let $\mathbb{E}_k = F_k$ be an equation in $\mathcal{X}$ such that $F_k$ is a rational expression over $\Sigma$ (with no variable). Then for any $(n-1)$-tuple $(L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$, the two following conditions are equivalent:

1. $(L_1, \ldots, L_{k-1}, L(F_k), L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}$;

2. $(L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}^k \setminus (\mathbb{E}_k = F_k)$.

Moreover, such a system admits a unique solution:

Proposition 5. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be a non-recursive equation system over a graded alphabet $\Sigma$ and over variables $\{\mathbb{E}_1, \ldots, \mathbb{E}_n\}$. Then $\mathcal{X}$ admits a unique solution.

Proof. By recurrence over the cardinality of $\mathcal{X}$.

1. If $\mathcal{X} = \{\mathbb{E}_1 = F_1\}$, then $F_1$ is a rational expression over $\Sigma$ (with no variable) and therefore $L(F_1)$ is the unique solution of $\mathcal{X}$.

2. Otherwise, since $\mathcal{X}$ is not recursive, there exists an equation $\mathbb{E}_k = F_k$ with $F_k$ a rational expression over $\Sigma$ (with no variable). Therefore, according to Corollary 4, a tuple $(L_1, \ldots, L_{k-1}, L(F_k), L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}$ if and only if $(L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}^k \setminus (\mathbb{E}_k = F_k)$. By recurrence hypothesis, since $\mathcal{X}^k \setminus (\mathbb{E}_k = F_k)$ is not recursive, it admits a unique solution $(L_1, \ldots, L_{k-1}, L_{k+1}, \ldots, L_n)$. Thus $(L_1, \ldots, L_{k-1}, L(F_k), L_{k+1}, \ldots, L_n)$ is a solution of $\mathcal{X}$. Finally, since for any language $L'_k \neq L(F_k)$ the tuple $(L_1, \ldots, L_{k-1}, L'_k, L_{k+1}, \ldots, L_n)$ is not a solution of $\mathbb{E}_k = F_k$, the tuple $(L_1, \ldots, L_{k-1}, L(F_k), L_{k+1}, \ldots, L_n)$ is the unique solution of $\mathcal{X}$.

□


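Example 3. Let us define the equation system $\mathcal{Y}$ as follows:

$$\mathcal{Y} = \begin{cases} \mathbb{E}_1 = f(\mathbb{E}_2, \mathbb{E}_3) + f(\mathbb{E}_2, \mathbb{E}_3)\\ \mathbb{E}_2 = b + f(\mathbb{E}_4, \mathbb{E}_4)\\ \mathbb{E}_3 = a + h(\mathbb{E}_4)\\ \mathbb{E}_4 = a + (f(a, b))^{*_b} \cdot_b a \end{cases}$$

This system is not recursive. Substituting successively $\mathbb{E}_4$, $\mathbb{E}_3$ and $\mathbb{E}_2$, we obtain:

$\mathcal{Y}^4$:

$$\begin{aligned} \mathbb{E}_1 &= f(\mathbb{E}_2, \mathbb{E}_3) + f(\mathbb{E}_2, \mathbb{E}_3)\\ \mathbb{E}_2 &= b + f(a + (f(a, b))^{*_b} \cdot_b a,\ a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_3 &= a + h(a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_4 &= a + (f(a, b))^{*_b} \cdot_b a \end{aligned}$$

$(\mathcal{Y}^4)^3$:

$$\begin{aligned} \mathbb{E}_1 &= f(\mathbb{E}_2,\ a + h(a + (f(a, b))^{*_b} \cdot_b a)) + f(\mathbb{E}_2,\ a + h(a + (f(a, b))^{*_b} \cdot_b a))\\ \mathbb{E}_2 &= b + f(a + (f(a, b))^{*_b} \cdot_b a,\ a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_3 &= a + h(a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_4 &= a + (f(a, b))^{*_b} \cdot_b a \end{aligned}$$

$((\mathcal{Y}^4)^3)^2$:

$$\begin{aligned} \mathbb{E}_1 &= f\big(b + f(a + (f(a, b))^{*_b} \cdot_b a,\ a + (f(a, b))^{*_b} \cdot_b a),\ a + h(a + (f(a, b))^{*_b} \cdot_b a)\big)\\ &\quad + f\big(b + f(a + (f(a, b))^{*_b} \cdot_b a,\ a + (f(a, b))^{*_b} \cdot_b a),\ a + h(a + (f(a, b))^{*_b} \cdot_b a)\big)\\ \mathbb{E}_2 &= b + f(a + (f(a, b))^{*_b} \cdot_b a,\ a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_3 &= a + h(a + (f(a, b))^{*_b} \cdot_b a)\\ \mathbb{E}_4 &= a + (f(a, b))^{*_b} \cdot_b a \end{aligned}$$

Since the right-hand sides of this last system contain no variable, Proposition 2 gives its unique solution, which by Proposition 4 is also the unique solution of $\mathcal{Y}$.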


4 Arden’s Lemma for Trees and Recursive Systems

Arden’s Lemma [2] is a fundamental result in automata theory. It gives a solution of the recursive language equation $X = A \cdot X \cup B$, where $X$ is an unknown language. It can be applied to compute a rational expression from an automaton, and therefore to prove the second direction of Kleene's theorem for strings. Following the same steps as in the string case, we generalize this lemma to trees.

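Proposition 6. Let $A$ and $B$ be two tree languages over a graded alphabet $\Sigma$ and let $c$ be a symbol in $\Sigma_0$. Then $A^{*_c} \cdot_c B$ is the smallest language in the family $\mathcal{F}$ of the languages $L$ over $\Sigma$ satisfying $L = A \cdot_c L \cup B$. Furthermore, if $c \notin A$, then $\mathcal{F} = \{A^{*_c} \cdot_c B\}$.

Proof. Let us set $Z = A^{*_c} \cdot_c B$.

1. $Z$ belongs to $\mathcal{F}$. Indeed, since $A^{*_c} = (A \cdot_c A^{*_c}) \cup \{c\}$ by Equations (3) and (4):

$$\begin{aligned} A \cdot_c (A^{*_c} \cdot_c B) \cup B &= (A \cdot_c A^{*_c}) \cdot_c B \cup B && \text{(Corollary 1)}\\ &= (A \cdot_c A^{*_c}) \cdot_c B \cup \{c\} \cdot_c B\\ &= ((A \cdot_c A^{*_c}) \cup \{c\}) \cdot_c B && \text{(Lemma 1)}\\ &= A^{*_c} \cdot_c B \end{aligned}$$

2. Let us now show that if $C$ belongs to $\mathcal{F}$, then $Z \subseteq C$. To do so, let us show that $A^{n,c} \cdot_c B \subseteq C$ for any integer $n \geq 0$. Since $C$ belongs to $\mathcal{F}$, $C = A \cdot_c C \cup B$; therefore $A^{0,c} \cdot_c B = B \subseteq C$ and $A \cdot_c C \subseteq C$. Suppose that $A^{n,c} \cdot_c B \subseteq C$ for some integer $n \geq 0$. From Corollary 2, $A \cdot_c (A^{n,c} \cdot_c B) \subseteq A \cdot_c C \subseteq C$, and therefore, from Lemma 1 and Corollary 1, $A^{n+1,c} \cdot_c B = (A^{n,c} \cdot_c B) \cup (A \cdot_c (A^{n,c} \cdot_c B)) \subseteq C$. Consequently, since $A^{n,c} \cdot_c B \subseteq C$ for any integer $n$, it holds that $Z = A^{*_c} \cdot_c B \subseteq C$.

3. Finally, let us show that if $c \notin A$, then any language $Y$ in $\mathcal{F}$ satisfies $Y \subseteq Z$, implying that $\mathcal{F} = \{Z\}$. Let $Y$ be a language satisfying $Y = A \cdot_c Y \cup B$, and suppose that $Y \not\subseteq Z$. Let $t$ be a tree in $Y \setminus Z$ such that $\mathrm{Height}(t)$ is minimal. Since $B \subseteq Z$, $t$ is not in $B$. Consequently, $t$ belongs to $A \cdot_c Y$, and therefore $t \in t_1 \cdot_c Y$ for some $t_1 \in A$. Since $c \notin A$, $t_1 \neq c$. Furthermore, if $c$ does not appear in $t_1$, then $t = t_1 \in A \cdot_c B \subseteq Z$, contradicting the fact that $t \notin Z$. Therefore $c$ appears in $t_1$, and $t$ is obtained by replacing each occurrence of $c$ in $t_1$ by a tree of $Y$ whose height is smaller than $\mathrm{Height}(t)$. If all these trees belonged to $Z$, then $t$ would belong to $t_1 \cdot_c Z \subseteq A \cdot_c Z \subseteq Z$ (the last inclusion holds since $Z \in \mathcal{F}$); hence one of them belongs to $Y \setminus Z$, contradicting the minimality of the height of $t$. As a direct consequence, any language $Y$ in $\mathcal{F}$ satisfies $Y \subseteq Z$. Following the previous point, since $Z \subseteq Y$, it holds that $Y = Z$.

□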
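Since $\cdot_c$ is monotone (Lemma 3), the smallest solution can be approached by Kleene iteration, as in the string case. This Python sketch (hypothetical helper names, trees encoded as nested tuples, an encoding assumed only for illustration) checks, for $A = \{f(c, c)\}$ and $B = \{a\}$, that the $(n+1)$-th iterate from $\emptyset$ equals $A^{n,c} \cdot_c B$. This exact equality relies on every tree of $A$ containing $c$; in general the iterates are only included in $A^{*_c} \cdot_c B$.

```python
from itertools import product


def c_product_tree(t, c, L):
    """t ._c L following Equation (2); L is a finite set of trees."""
    label, children = t[0], t[1:]
    if not children:
        return set(L) if label == c else {t}
    options = [c_product_tree(u, c, L) for u in children]
    return {(label,) + combo for combo in product(*options)}


def c_product(L1, c, L2):
    """Extension of ._c to languages."""
    result = set()
    for t in L1:
        result |= c_product_tree(t, c, L2)
    return result


def iterated(A, c, n):
    """A^{n,c}: A^{0,c} = {c} and A^{k+1,c} = A^{k,c} | A ._c A^{k,c}."""
    result = {(c,)}
    for _ in range(n):
        result = result | c_product(A, c, result)
    return result


def kleene_iterates(A, B, c, steps):
    """X_0 = empty set, X_{k+1} = A ._c X_k | B."""
    X, history = set(), [set()]
    for _ in range(steps):
        X = c_product(A, c, X) | B
        history.append(X)
    return history


A = {("f", ("c",), ("c",))}   # every tree of A contains c, and c is not in A
B = {("a",)}
hist = kleene_iterates(A, B, "c", 4)
for n in range(4):
    assert hist[n + 1] == c_product(iterated(A, "c", n), "c", B)
print([len(X) for X in hist])  # [0, 1, 2, 5, 26]
```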





By successive substitutions, any recursive system can be transformed into an equivalent system containing a variable $\mathbb{E}_j$ that satisfies $\mathbb{E}_j \prec_{\mathcal{X}} \mathbb{E}_j$. Let us highlight a specific case in which recursive equations can be solved.

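For an integer $k$, the $k$-split of an expression $F$ over $(\Sigma, \{\mathbb{E}_1, \ldots, \mathbb{E}_n\})$ is the pair $k\text{-split}(F)$ inductively defined by:

$$k\text{-split}(F) = \begin{cases} (F'_1 + F'_2,\ F''_1 + F''_2) & \text{if } F = F_1 + F_2 \wedge k\text{-split}(F_1) = (F'_1, F''_1) \wedge k\text{-split}(F_2) = (F'_2, F''_2),\\ (F, \emptyset) & \text{otherwise, if } \mathbb{E}_k \text{ appears in } F,\\ (\emptyset, F) & \text{otherwise.} \end{cases}$$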


Obviously, if $k\text{-split}(F) = (F', F'')$, then $F \sim F' + F''$. This pair can be used to factorize a recursive equation in order to apply Arden’s Lemma. Indeed, as a direct consequence of Proposition 1:

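Proposition 7. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. Let $a$ be a symbol not in $\Sigma$. Let $1 \leq k \leq n$ be an integer. Let $\Gamma \subseteq \Sigma$ be the subset defined by $\Gamma = \{c \in \Sigma_0 \mid \{\cdot_c, *_c\} \cap \mathrm{op}(F_k) \neq \emptyset\}$. Let $\mathcal{L}$ be an $n$-tuple of tree languages over the alphabet $\Sigma \setminus \Gamma$. Let $k\text{-split}(F_k) = (F'_k, F''_k)$. Then the two following conditions are equivalent:

1. $\mathcal{L}$ is a solution of $\mathbb{E}_k = F_k$,

2. $\mathcal{L}$ is a solution of $\mathbb{E}_k = (F'_k)_{\mathbb{E}_k \leftarrow a} \cdot_a \mathbb{E}_k + F''_k$.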

Once an equation is factorized, Arden’s Lemma can be applied by contraction:

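Definition 2. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. Let $1 \leq k \leq n$ be an integer such that $F_k = F'_k \cdot_c \mathbb{E}_k + F''_k$, where $\mathbb{E}_k$ appears neither in $F'_k$ nor in $F''_k$. The contraction of $(\mathbb{E}_k = F_k)$ in $\mathcal{X}$ is the system $\mathcal{X}_k = \{\mathbb{E}_k = (F'_k)^{*_c} \cdot_c F''_k\} \cup \{\mathbb{E}_j = F_j \mid j \neq k \wedge 1 \leq j \leq n\}$.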

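Following Proposition 6, such a contraction preserves the solutions:

Proposition 8. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be an equation system. Let $1 \leq k \leq n$ be an integer such that $F_k = F'_k \cdot_c \mathbb{E}_k + F''_k$. Let $\mathcal{L} = (L_1, \ldots, L_n)$ be an $n$-tuple of tree languages. Then the two following conditions are equivalent:

1. $\mathcal{L}$ is a solution of $\mathcal{X}$,

2. $\mathcal{L}$ is a solution of $\mathcal{X}_k$.

Furthermore, if $c \notin L_{\mathcal{L}}(F'_k)$, then for any language $L'_k \neq L_{\mathcal{L}}((F'_k)^{*_c} \cdot_c F''_k)$,

$$(L_1, \ldots, L_{k-1}, L'_k, L_{k+1}, \ldots, L_n) \text{ is not a solution of } \mathcal{X}.$$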


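Example 4. Let us consider the system $\mathcal{X}^4$ of Example 2:

$$\mathcal{X}^4 = \begin{cases} \mathbb{E}_1 = f(\mathbb{E}_1, \mathbb{E}_1) + f(\mathbb{E}_2, a + h(\mathbb{E}_3))\\ \mathbb{E}_2 = b + f(\mathbb{E}_2, a + h(\mathbb{E}_3))\\ \mathbb{E}_3 = a + h(a + h(\mathbb{E}_3))\\ \mathbb{E}_4 = a + h(\mathbb{E}_3) \end{cases}$$

The 2-split of $b + f(\mathbb{E}_2, a + h(\mathbb{E}_3))$ is $(f(\mathbb{E}_2, a + h(\mathbb{E}_3)), b)$; it leads to the factorization $f(x_2, a + h(\mathbb{E}_3)) \cdot_{x_2} \mathbb{E}_2 + b$, where $x_2$ is a symbol not in $\Sigma$, which is contracted into $(f(x_2, a + h(\mathbb{E}_3)))^{*_{x_2}} \cdot_{x_2} b$.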
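The $k$-split, the factorization of Proposition 7 and the contraction of Definition 2 are purely syntactic. This Python sketch reproduces the computation of Example 4 on a tuple encoding of expressions assumed only for illustration (`("sum", E1, E2)`, `("prod", c, E1, E2)` for $E_1 \cdot_c E_2$, `("star", c, E)` for $E^{*_c}$, `("var", name)`, `("sym", f, children)`, `("empty",)`); empty summands are simplified away, as in the examples, and all function names are hypothetical. It does not check the closedness conditions that make the factorization sound.

```python
EMPTY = ("empty",)


def subst(e, x, r):
    """(e)_{x <- r}: replace every occurrence of the variable x by r."""
    tag = e[0]
    if tag == "var":
        return r if e[1] == x else e
    if tag == "sym":
        return ("sym", e[1], tuple(subst(u, x, r) for u in e[2]))
    if tag == "sum":
        return ("sum", subst(e[1], x, r), subst(e[2], x, r))
    if tag == "prod":
        return ("prod", e[1], subst(e[2], x, r), subst(e[3], x, r))
    if tag == "star":
        return ("star", e[1], subst(e[2], x, r))
    return e


def variables(e):
    """Set of the variables occurring in e."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "sym":
        return set().union(*(variables(u) for u in e[2]))
    if tag == "sum":
        return variables(e[1]) | variables(e[2])
    if tag == "prod":
        return variables(e[2]) | variables(e[3])
    if tag == "star":
        return variables(e[2])
    return set()


def plus(e1, e2):
    """E1 + E2, dropping empty summands."""
    if e1 == EMPTY:
        return e2
    if e2 == EMPTY:
        return e1
    return ("sum", e1, e2)


def k_split(e, k):
    """k-split(F) = (F', F''): summands containing the variable k, and the others."""
    if e[0] == "sum":
        p1, q1 = k_split(e[1], k)
        p2, q2 = k_split(e[2], k)
        return plus(p1, p2), plus(q1, q2)
    return (e, EMPTY) if k in variables(e) else (EMPTY, e)


def factorize(e, k, fresh):
    """(F')_{E_k <- fresh} ._fresh E_k + F''   (Proposition 7)."""
    fp, fpp = k_split(e, k)
    return plus(("prod", fresh, subst(fp, k, ("sym", fresh, ())), ("var", k)), fpp)


def contract(e, k, fresh):
    """((F')_{E_k <- fresh})^{*fresh} ._fresh F''   (Definition 2)."""
    fp, fpp = k_split(e, k)
    return ("prod", fresh, ("star", fresh, subst(fp, k, ("sym", fresh, ()))), fpp)


a, b, x2 = ("sym", "a", ()), ("sym", "b", ()), ("sym", "x2", ())

def f(s, t):
    return ("sym", "f", (s, t))

def h(s):
    return ("sym", "h", (s,))

def var(name):
    return ("var", name)

# Right-hand side of E2 in Example 4: b + f(E2, a + h(E3))
F2 = ("sum", b, f(var("E2"), ("sum", a, h(var("E3")))))
print(k_split(F2, "E2") == (f(var("E2"), ("sum", a, h(var("E3")))), b))  # True
factor = factorize(F2, "E2", "x2")       # f(x2, a + h(E3)) ._x2 E2 + b
contracted = contract(F2, "E2", "x2")    # (f(x2, a + h(E3)))^{*x2} ._x2 b
```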


However, as recalled in Proposition 7, the factorization that precedes a contraction does not necessarily produce an equivalent expression. Let us now define a sufficient property to detect solvable systems; it is related to the symbols that appear in a product or a closure.

The scope of an operator is its operands. An occurrence of a symbol $c$ in $\Sigma_0$ is said to be bounded if it appears in the scope of an operator $\cdot_c$ or $*_c$, or if it is the symbol of such an operator. An expression (resp. a system $\mathcal{X}$) is said to be closed if all the occurrences of a bounded symbol are bounded. In this case, the set $\mathrm{free}(\mathcal{X})$ contains the symbols of $\Sigma_0$ that are not bounded.

Let us first show that closedness is preserved by substitution, factorization and contraction.




Lemma 9. Let $F$ and $F'$ be two closed expressions over $(\Sigma, \mathbb{E})$ such that the bounded symbols of $F$ are bounded in $F'$. Let $\mathbb{E}_k$ be a variable in $\mathbb{E}$. Then:

$$(F)_{\mathbb{E}_k \leftarrow F'} \text{ is closed.}$$

Proof. By induction over the structure of $F$. Let us define, for any expression $H$, the expression $G(H) = (H)_{\mathbb{E}_k \leftarrow F'}$, and let us set $G = G(F)$.

1. If $F \in \Sigma_0 \cup \{\emptyset\} \cup \mathbb{E} \setminus \{\mathbb{E}_k\}$, then $G = F$. Therefore $G$ is closed.

2. If $F = \mathbb{E}_k$, then $G = F'$. Therefore $G$ is closed.

3. If $F = f(F_1, \ldots, F_n)$, then $G = f(G(F_1), \ldots, G(F_n))$. By induction hypothesis, $G(F_1), \ldots, G(F_n)$ are closed, and as a consequence so is $G$.

4. If $F = F_1 + F_2$, then $G = G(F_1) + G(F_2)$. By induction hypothesis, $G(F_1)$ and $G(F_2)$ are closed, and therefore so is $G$.

5. If $F = F_1 \cdot_c F_2$, then $G = G(F_1) \cdot_c G(F_2)$. By induction hypothesis, $G(F_1)$ and $G(F_2)$ are closed. Since the bounded symbols of $F$ are bounded in $F'$, $c$ is bounded in $G(F_1)$. Consequently, $G$ is closed.

6. If $F = F_1^{*_c}$, then $G = (G(F_1))^{*_c}$. By induction hypothesis, $G(F_1)$ is closed. Since the bounded symbols of $F$ are bounded in $F'$, $c$ is bounded in $G(F_1)$. Consequently, $G$ is closed.

□


As two direct consequences of Lemma 9:

Corollary 5. Let $\mathcal{X}$ be a closed equation system over $n$ variables. Let $1 \leq k \leq n$ be an integer. Then $\mathcal{X}^k$ is closed.

Corollary 6. Let $F$ be a closed expression over $(\Sigma, \mathbb{E})$. Let $\mathbb{E}_k$ be a variable in $\mathbb{E}$. Let $k\text{-split}(F) = (F', F'')$. Let $a$ be a symbol not in $\Sigma$. Then:

$$(F')_{\mathbb{E}_k \leftarrow a} \cdot_a \mathbb{E}_k + F'' \text{ is closed.}$$

The stability of closedness under contraction is even easier to prove, since contraction is not an inductive transformation:

Lemma 10. Let $E = F \cdot_c F' + F''$ be a closed expression. Then:

$$F^{*_c} \cdot_c F'' \text{ is a closed expression.}$$

Proof. Let $E' = F^{*_c} \cdot_c F''$. Suppose that $E'$ is not closed. Then either there exists an occurrence of $c$ that is not bounded in $F''$, or there exists an operator in $\{\cdot_a, *_a\}$ appearing in $F$ (resp. $F''$) such that an occurrence of $a$ is not bounded in $F''$ (resp. in $F$). Both cases contradict the closedness of $E$. □

Corollary 7. Let $\mathcal{X} = \{(\mathbb{E}_j = F_j) \mid 1 \leq j \leq n\}$ be a closed equation system. Let $1 \leq k \leq n$ be an integer such that $F_k = F'_k \cdot_c \mathbb{E}_k + F''_k$. Then:

$$\mathcal{X}_k \text{ is closed.}$$

Finally, let us show that a closed system can be effectively solved: we show that it admits rational solutions, i.e. solutions formed by rational languages, and we give a way to compute expressions denoting them. In the following, we say that an $n$-tuple of rational expressions $(E_1, \ldots, E_n)$ denotes a rational solution $(L_1, \ldots, L_n)$ if $L_i = L(E_i)$ for any $1 \leq i \leq n$. The following example illustrates how to compute rational expressions denoting a solution.

Example 5. Let us consider the closed system X of Example [TJ By substitution of E 3 , we obtain 

'Ei =/(Ei,Ei)+/(E 2 ,E 4 ) 

E 2 = b -\- f (E 2 , a /i(E 4 )) 

E 3 = a + /i(E 4 ) 

E 4 = a + h{a + /i(E 4 )) 


X^ = 



The 4—split of a + h{a + /i(E 4 )) leads to the factorization {h{a + h{xi))) ■x 4 E + a, contracted in 
(h(a + h(x4)))*^4 a. Then, we obtain 

El =/(Ei,Ei) + /(E2,E4) 

E2 = h + /(E2, CL + /i(E 4 )) 

E 3 = a + /i(E 4 ) 

E 4 = {h{a + h{x 4 )))*^i •x 4 a 

By substitution, 

El = /(El, El) + /(E2, {h{a + h{x4)))*^4 a) 

^ E2 =b + /(E2, a + h{{h{a + h{x4)))*^4, a)) 

E 3 =a + h{{h{a + h{x4)))*^i-x^a) 

_E 4 = {h{a + h{x4)))*^‘i -xi a 

The 2—split of 6 + /(E 2 , a + h{{h{a + /i(x 4)))*"'4 a)) leads to the factorization (/(x 2 , a + h{{h{a + 

h{x4)))*^4 a))) ■x 2 E 2 + 6, contracted in {f{x 2 ,a + h{{h{a + h{x4)))*^4 -x^ a )))*^2 ft. Thus, we obtain 

the new system 

El = /(El,El) + /((/(x2,a + h{{h{a + ft.(x 4)))*"4 a )))*"2 b, {h{a + h{x4)))*^‘i -x^, a) 

E2 = (/(x2, a + h((h(a + h{x4)))*^i -x^ a)))*^^ ■x2 b 
E 3 = a + ft,((ft,(a + ft,(x 4)))*“^4 . 3 ,^ a) 

_E 4 = {h{a +h{x4)))*^i-x^a 

Finally, factorizing/contracting the hrst equation, we obtain the solution 

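$$\begin{cases}
E_1 = (f(x_1, x_1))^{*_{x_1}} \cdot_{x_1} f((f(x_2, a + h((h(a + h(x_4)))^{*_{x_4}} \cdot_{x_4} a)))^{*_{x_2}} \cdot_{x_2} b,\ (h(a + h(x_4)))^{*_{x_4}} \cdot_{x_4} a) \\
E_2 = (f(x_2, a + h((h(a + h(x_4)))^{*_{x_4}} \cdot_{x_4} a)))^{*_{x_2}} \cdot_{x_2} b \\
E_3 = a + h((h(a + h(x_4)))^{*_{x_4}} \cdot_{x_4} a) \\
E_4 = (h(a + h(x_4)))^{*_{x_4}} \cdot_{x_4} a
\end{cases}$$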
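Whether a tuple of expressions denotes a solution can be tested, at least on small trees, with a bounded evaluator. The Python sketch below is ours and only illustrative: the tuple encoding and the names lang, plug and denotes_solution are hypothetical, and we assume the usual semantics in which $E \cdot_c E'$ replaces each leaf $c$ of a tree of $L(E)$ independently by a tree of $L(E')$, and $L(E^{*_c})$ is the union of the iterates $L^{0,c} = \{c\}$, $L^{n+1,c} = L^{n,c} \cup L(E) \cdot_c L^{n,c}$. Since replacing a leaf by a tree never lowers the height, cutting every intermediate language at height $d$ loses nothing. Agreement up to height $d$ is evidence, not a proof, that a tuple denotes a solution.

```python
from itertools import product

# Hypothetical encoding (ours): trees are (label, child_1, ..., child_n), leaves (label,).
# Expressions: ("sym", f, *args) for f(args), ("+", A, B), (".", c, A, B) for A ._c B,
# ("*", c, A) for A^{*c}, and ("var", j) for the unknown E_j.

def height(t):
    return 0 if len(t) == 1 else 1 + max(height(ch) for ch in t[1:])

def subst(t, c, repl, d):
    """Replace each leaf c of t independently by a tree of repl; keep height <= d."""
    if len(t) == 1:
        return set(repl) if t[0] == c else {t}
    out = set()
    for kids in product(*(subst(ch, c, repl, d) for ch in t[1:])):
        if height((t[0],) + kids) <= d:
            out.add((t[0],) + kids)
    return out

def lang(e, d):
    """Trees of height <= d in L(e); e must not contain unknowns."""
    if d < 0:
        return set()
    op = e[0]
    if op == "sym":
        return {(e[1],) + kids for kids in product(*(lang(arg, d - 1) for arg in e[2:]))}
    if op == "+":
        return lang(e[1], d) | lang(e[2], d)
    if op == ".":
        right = lang(e[3], d)
        return set().union(*(subst(t, e[1], right, d) for t in lang(e[2], d)))
    if op == "*":
        base, acc = lang(e[2], d), {(e[1],)}
        while True:  # L^{n+1,c} = L^{n,c} | L ._c L^{n,c}, until nothing new fits
            new = acc | set().union(*(subst(t, e[1], acc, d) for t in base))
            if new == acc:
                return acc
            acc = new
    raise ValueError(f"unexpected node {op!r}")

def plug(e, sol):
    """Replace every unknown ("var", j) in e by the expression sol[j]."""
    if e[0] == "var":
        return sol[e[1]]
    return (e[0],) + tuple(plug(s, sol) if isinstance(s, tuple) else s for s in e[1:])

def denotes_solution(system, sol, d):
    """Check L(S_i) = L(F_i[E := S]) for every i, on trees of height <= d only."""
    return all(lang(sol[i], d) == lang(plug(fi, sol), d) for i, fi in system.items())

a, x1 = ("sym", "a"), ("sym", "x1")
system = {1: ("+", a, ("sym", "h", ("var", 1)))}            # E1 = a + h(E1)
arden = {1: (".", "x1", ("*", "x1", ("sym", "h", x1)), a)}  # (h(x1))^{*x1} ._{x1} a
print(denotes_solution(system, arden, 5))   # True
print(denotes_solution(system, {1: a}, 5))  # False: L(a + h(a)) = {a, h(a)} is not {a}
```

For the one-equation system $E_1 = a + h(E_1)$, the contracted expression $(h(x_1))^{*_{x_1}} \cdot_{x_1} a$ passes the check, while the constant $a$ does not.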

Any closed system admits a canonical resolution, defined in the proof of the following theorem.

Theorem 1. Let $\mathcal{X} = \{(E_j = F_j) \mid 1 \leq j \leq n\}$ be a closed equation system over a graded alphabet $\Sigma$ and over variables $\{E_1, \ldots, E_n\}$. Then

$\mathcal{X}$ admits a regular solution over $\mathrm{free}(\mathcal{X})$.

Furthermore, an $n$-tuple of rational expressions denoting this solution can be computed.

Proof. By induction on the cardinality of $\mathcal{X}$.

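1. Suppose that the equation $E_n = F_n$ is not recursive.

(a) If $n = 1$, then $F_1$ is a rational expression and therefore $L(F_1)$ is the unique solution of $\mathcal{X}$. Since $\mathcal{X}$ is closed, $L(F_1) \subseteq T(\mathrm{free}(\mathcal{X}))$.

(b) Otherwise, consider the system $\mathcal{X}' = \mathcal{X}_n \setminus \{E_n = F_n\}$. From Corollary El, the system $\mathcal{X}'$ is closed.

By the induction hypothesis, $\mathcal{X}'$ admits a regular solution $Z = (L_1, \ldots, L_{n-1})$ over $\mathrm{free}(\mathcal{X})$, denoted by $(E_1, \ldots, E_{n-1})$. From Lemma 8, this implies that $(L_1, \ldots, L_{n-1}, L_Z(F_n))$ is a solution of $\mathcal{X}$ which is, by construction of $Z$, a solution over $\mathrm{free}(\mathcal{X})$. From Lemma 7, $L_Z(F_n)$ is denoted by the expression $E_n$ obtained from $F_n$ by successively replacing each variable $E_i$, $1 \leq i \leq n-1$, with the expression denoting $L_i$; $E_n$ is a rational expression with no variables. Therefore $\mathcal{X}$ admits a regular solution $(L_1, \ldots, L_{n-1}, L_Z(F_n))$ over $\mathrm{free}(\mathcal{X})$, denoted by $(E_1, \ldots, E_n)$.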

2. Consider that the equation $E_n = F_n$ is recursive. Let $n\text{-split}(F_n) = (F', F'')$, let $x_n$ be a symbol not in $\Sigma$, and let $\widehat{F}_n = F' \cdot_{x_n} E_n + F''$. Since $\mathcal{X}$ is closed, it follows from Proposition 7 that $\mathcal{X}$ admits a solution over $\mathrm{free}(\mathcal{X})$ if and only if $\mathcal{X}' = (\mathcal{X} \setminus \{E_n = F_n\}) \cup \{E_n = \widehat{F}_n\}$ does. From Corollary 6, $\mathcal{X}'$ is closed. From Propositional $\mathcal{X}'$ admits a solution over $\mathrm{free}(\mathcal{X})$ if and only if $\mathcal{X}_n$ does. From Lemma m, $\mathcal{X}_n$ is closed and contains the equation $E_n = F'^{*_{x_n}} \cdot_{x_n} F''$, which is not recursive. The existence of the solution then follows from point (1).






□ 


In other words, 

Theorem 2. Any closed equation system is effectively solvable. 
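The canonical resolution of the proof of Theorem 1 reads as a recursive procedure: if $E_n = F_n$ is recursive, contract it; substitute the result into the other equations; solve the remaining $n-1$ equations; then back-substitute their solutions into the expression of $E_n$. The Python sketch below follows that order. It is a sketch under assumptions of ours: the tuple encoding and the function names are hypothetical, the $k$-split is assumed to separate the top-level summands that contain $E_k$ from the others, and the input is assumed (not checked) to be closed. Run on the system associated with the automaton of Example 6, it performs the same steps as that example and prints the same four expressions, with $\cdot_{x_k}$ written .xk and $^{*_{x_k}}$ written ^*xk.

```python
# Hypothetical encoding (ours): ("var", k) is the unknown E_k, ("sym", f, *args)
# is f(args), ("+", A, B) is A + B, (".", c, A, B) is A ._c B, ("*", c, A) is A^{*c}.

def occurs(e, k):
    """Does the unknown E_k occur in e?"""
    if e[0] == "var":
        return e[1] == k
    return any(occurs(s, k) for s in e[1:] if isinstance(s, tuple))

def substitute(e, k, s):
    """Replace every occurrence of the unknown E_k in e by the expression s."""
    if e[0] == "var":
        return s if e[1] == k else e
    return (e[0],) + tuple(substitute(t, k, s) if isinstance(t, tuple) else t
                           for t in e[1:])

def summands(e):
    return summands(e[1]) + summands(e[2]) if e[0] == "+" else [e]

def plus(terms):
    out = terms[0]
    for t in terms[1:]:
        out = ("+", out, t)
    return out

def contract(e, k):
    """Assumed summand-based k-split, then F' ._{x_k} E_k + F'' -> F'^{*x_k} ._{x_k} F''."""
    x = ("sym", f"x{k}")
    rec = [substitute(t, k, x) for t in summands(e) if occurs(t, k)]
    rest = [t for t in summands(e) if not occurs(t, k)]
    if not rec:
        return e  # not recursive: nothing to contract
    if not rest:
        raise ValueError(f"no summand of F_{k} is free of E_{k}")
    return (".", x[1], ("*", x[1], plus(rec)), plus(rest))

def solve(eqs):
    """Canonical resolution of a (supposedly closed) system {k: F_k}, k = 1..n."""
    n = max(eqs)
    rhs = contract(eqs[n], n)                   # case 2: make E_n non-recursive
    rest = {j: substitute(fj, n, rhs) for j, fj in eqs.items() if j != n}
    sol = solve(rest) if rest else {}           # case 1(b): solve the smaller system
    for j, sj in sol.items():
        rhs = substitute(rhs, j, sj)            # back-substitute E_1, ..., E_{n-1}
    sol[n] = rhs
    return sol

def show(e):
    """Render an expression in the notation of the text."""
    op = e[0]
    if op == "var":
        return f"E{e[1]}"
    if op == "sym":
        return e[1] if len(e) == 2 else f"{e[1]}({', '.join(map(show, e[2:]))})"
    if op == "+":
        return f"{show(e[1])} + {show(e[2])}"
    if op == "*":
        return f"({show(e[2])})^*{e[1]}"
    def par(s):
        return f"({show(s)})" if s[0] == "+" else show(s)
    return f"{par(e[2])} .{e[1]} {par(e[3])}"

def V(k):
    return ("var", k)

def sym(f, *args):
    return ("sym", f, *args)

a, b = sym("a"), sym("b")
system = {1: ("+", sym("f", V(1), V(1)), sym("f", V(2), V(4))),
          2: ("+", b, sym("f", V(2), V(4))),
          3: ("+", a, sym("h", V(4))),
          4: ("+", a, sym("h", V(3)))}
solution = solve(system)
for k in sorted(solution):
    print(f"E{k} = {show(solution[k])}")
```

Processing the equations from $E_n$ down to $E_1$ mirrors the induction of the proof; other orders (as in Example 5, which starts with $E_3$) lead to different but equivalent expressions.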

5 Construction of a Rational Tree Expression from an Automaton 

In this section, we show how to extract a system of tree language equations from a given FTA $A = (\Sigma, Q, Q_f, \Delta)$. Then, using Arden's Lemma and the transformations (contraction and substitution) defined in the previous sections, we show how to solve it and compute, for each state $q$ in $Q$, an equivalent rational expression $E_q$, by associating with $q$ an equation defining $L(q)$. Let us first recall a basic property of the down language of a state:

Lemma 11. Let $A = (\Sigma, Q, Q_f, \Delta)$ be an FTA. Let $q \in Q$ be a state. Then:

$$L(q) = \bigcup_{(f, q_1, \ldots, q_n, q) \in \Delta} f(L(q_1), \ldots, L(q_n))$$

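Proof. Let us set $L'(q) = \bigcup_{(f, q_1, \ldots, q_n, q) \in \Delta} f(L(q_1), \ldots, L(q_n))$. Let $t = f(t_1, \ldots, t_n)$ be a tree in $T(\Sigma)$. Let us show that $t \in L(q) \Leftrightarrow t \in L'(q)$. By definition, $t \in L(q) \Leftrightarrow q \in \delta(t)$. Then:

$$\begin{aligned}
q \in \delta(t) &\Leftrightarrow \exists (f, q_1, \ldots, q_n, q) \in \Delta,\ (\forall 1 \leq i \leq n,\ q_i \in \delta(t_i)) \\
&\Leftrightarrow \exists (f, q_1, \ldots, q_n, q) \in \Delta,\ (\forall 1 \leq i \leq n,\ t_i \in L(q_i)) \\
&\Leftrightarrow \exists (f, q_1, \ldots, q_n, q) \in \Delta,\ t \in f(L(q_1), \ldots, L(q_n)) \\
&\Leftrightarrow t \in L'(q)
\end{aligned}$$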

□ 
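Lemma 11 is easy to check mechanically on small trees. The Python sketch below uses, as an assumption for illustration, the automaton of Example 6, whose transitions can be read off from the system associated with it in that example; the names delta, all_trees and down are ours. It computes $\delta(t)$ bottom-up and compares, for each state $q$, the set $\{t \mid q \in \delta(t)\}$ with the union given by Lemma 11, both restricted to trees of height at most 3.

```python
from itertools import product

# Transitions (f, q1, ..., qn, q) of the FTA of Example 6, read off from the
# system associated with it there; symbol ranks: f/2, h/1, a/0, b/0.
DELTA = [("f", 1, 1, 1), ("f", 2, 4, 1), ("b", 2), ("f", 2, 4, 2),
         ("a", 3), ("h", 4, 3), ("a", 4), ("h", 3, 4)]
RANK = {"f": 2, "h": 1, "a": 0, "b": 0}

def delta(t):
    """States reachable at the root of t = (label, child_1, ..., child_n), bottom-up."""
    below = [delta(c) for c in t[1:]]
    return {tr[-1] for tr in DELTA
            if tr[0] == t[0] and len(tr) == len(t) + 1
            and all(q in s for q, s in zip(tr[1:-1], below))}

def all_trees(d):
    """Every tree over RANK of height at most d."""
    if d < 0:
        return set()
    smaller = all_trees(d - 1)
    return {(g,) + kids for g, r in RANK.items() for kids in product(smaller, repeat=r)}

def down(q, d):
    """L(q) cut at height d, built with the union of Lemma 11."""
    if d < 0:
        return set()
    out = set()
    for tr in DELTA:
        if tr[-1] == q:
            parts = [down(p, d - 1) for p in tr[1:-1]]
            out |= {(tr[0],) + kids for kids in product(*parts)}
    return out

trees = all_trees(3)
for q in range(1, 5):
    by_run = {t for t in trees if q in delta(t)}
    print(q, len(by_run), by_run == down(q, 3))
```

Each printed line should end with True; the comparison is exhaustive only up to the chosen height.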

The previous lemma can be used to define an equation system that describes the relations between the down languages of the states of a given FTA.

Let $A = (\Sigma, Q, Q_f, \Delta)$ be an FTA with $Q = \{1, \ldots, n\}$. The equation system associated with $A$ is the set of equations $\mathcal{X}_A$ over the variables $E_1, \ldots, E_n$ defined by $\mathcal{X}_A = \{\mathcal{E}_q \mid q \in Q\}$ where, for any state $q$ in $Q$, $\mathcal{E}_q$ is the equation $E_q = F_q$ with $F_q = \sum_{(f, q_1, \ldots, q_n, q) \in \Delta} f(E_{q_1}, \ldots, E_{q_n})$. Let us show that any solution of $\mathcal{X}_A$ denotes the down languages of the states of $A$.

Proposition 9. Let $A = (\Sigma, Q, Q_f, \Delta)$ be an FTA with $Q = \{1, \ldots, n\}$. Let $(E_1, \ldots, E_n)$ be a solution of $\mathcal{X}_A$. Then:

$$\forall 1 \leq j \leq n,\ L(E_j) = L(j).$$

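Proof. Let $t$ be a tree over $\Sigma$. Let us show by induction over $t$ that $t \in L(E_j) \Leftrightarrow t \in L(j)$.

1. Consider that $t \in \Sigma_0$. Then

$$\begin{aligned}
t \in L(E_j) &\Leftrightarrow t \in L\Big(\sum_{(f, q_1, \ldots, q_n, j) \in \Delta} f(E_{q_1}, \ldots, E_{q_n})\Big) \\
&\Leftrightarrow (t, j) \in \Delta \\
&\Leftrightarrow t \in L(j)
\end{aligned}$$

2. Otherwise, $t = g(t_1, \ldots, t_k)$ and

$$\begin{aligned}
t \in L(E_j) &\Leftrightarrow t \in L\Big(\sum_{(f, q_1, \ldots, q_n, j) \in \Delta} f(E_{q_1}, \ldots, E_{q_n})\Big) \\
&\Leftrightarrow \exists (g, q_1, \ldots, q_k, j) \in \Delta \wedge \forall 1 \leq l \leq k,\ t_l \in L(E_{q_l}) \\
&\Leftrightarrow \exists (g, q_1, \ldots, q_k, j) \in \Delta \wedge \forall 1 \leq l \leq k,\ t_l \in L(q_l) && \text{(induction hypothesis)} \\
&\Leftrightarrow t \in \bigcup_{(f, q_1, \ldots, q_n, j) \in \Delta} f(L(q_1), \ldots, L(q_n)) \\
&\Leftrightarrow t \in L(j) && \text{(Lemma 11)}
\end{aligned}$$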


□ 
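Building $\mathcal{X}_A$ from $\Delta$ is a direct transcription of the definition of $F_q$. In the Python sketch below (the function name equation_system and the string rendering are ours), the transitions are again those of the automaton of Example 6, read off from its associated system, and the output gives back that system. An empty sum is printed as ∅, a convention of this sketch only.

```python
# Transitions (f, q1, ..., qn, q) of the FTA of Example 6, read off from its system.
DELTA = [("f", 1, 1, 1), ("f", 2, 4, 1), ("b", 2), ("f", 2, 4, 2),
         ("a", 3), ("h", 4, 3), ("a", 4), ("h", 3, 4)]

def equation_system(delta, states):
    """Map each state q to F_q, the sum of f(E_q1, ..., E_qn) over (f, q1, ..., qn, q)."""
    system = {}
    for q in states:
        terms = []
        for f, *args, target in delta:
            if target == q:
                terms.append(f"{f}({', '.join(f'E{p}' for p in args)})" if args else f)
        system[q] = " + ".join(terms) or "∅"  # empty sum: no tree reaches q
    return system

for q, rhs in equation_system(DELTA, range(1, 5)).items():
    print(f"E{q} = {rhs}")
# E1 = f(E1, E1) + f(E2, E4)
# E2 = b + f(E2, E4)
# E3 = a + h(E4)
# E4 = a + h(E3)
```

Since every variable $E_{q_i}$ appears only as a direct argument of a symbol, the system is built in one pass over $\Delta$.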


Since $\mathcal{X}_A$ is closed by definition, it follows from Theorem 2 that:

Theorem 3. Let $A = (\Sigma, Q, Q_f, \Delta)$ be an FTA. Then:

$\mathcal{X}_A$ can be effectively solved.

As a direct consequence of Theorem 3, of Proposition 9 and of Equation (1):

Theorem 4. Let $A = (\Sigma, \{1, \ldots, n\}, Q_f, \Delta)$ be an FTA. Let $(E_1, \ldots, E_n)$ denote a solution of $\mathcal{X}_A$. Then:

$L(A)$ is denoted by the rational expression

$$\sum_{j \in Q_f} E_j.$$

Example 6. Let us consider the FTA A in Figured) The system associated with A is the system X in 
Example dl 

$$\mathcal{X} = \begin{cases}
E_1 = f(E_1, E_1) + f(E_2, E_4) \\
E_2 = b + f(E_2, E_4) \\
E_3 = a + h(E_4) \\
E_4 = a + h(E_3)
\end{cases}$$

Let us apply the resolution defined in the proof of Theorem 1. We first compute $\mathcal{X}_4$:

$$\mathcal{X}_4 = \begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h(E_3)) \\
E_2 = b + f(E_2, a + h(E_3)) \\
E_3 = a + h(a + h(E_3)) \\
E_4 = a + h(E_3)
\end{cases}$$

Then we have to solve the closed subsystem

$$\begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h(E_3)) \\
E_2 = b + f(E_2, a + h(E_3)) \\
E_3 = a + h(a + h(E_3))
\end{cases} \tag{6}$$


The 3-split of $a + h(a + h(E_3))$ leads to the factorization $h(a + h(x_3)) \cdot_{x_3} E_3 + a$, contracted in $(h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a$. Thus, the system (6) is equivalent to

$$\begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h(E_3)) \\
E_2 = b + f(E_2, a + h(E_3)) \\
E_3 = (h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a
\end{cases}$$

and, by substitution of $E_3$, to

$$\begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_2 = b + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_3 = (h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a
\end{cases}$$

Now, let us solve the new subsystem 

$$\begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_2 = b + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a))
\end{cases} \tag{7}$$







The 2-split of $b + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a))$ leads to the factorization $(f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a))) \cdot_{x_2} E_2 + b$, contracted in $(f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b$. Consequently, the system (7) is equivalent to

$$\begin{cases}
E_1 = f(E_1, E_1) + f(E_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_2 = (f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b
\end{cases}$$

and by substitution to 

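$$\begin{cases}
E_1 = f(E_1, E_1) + f((f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b,\ a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_2 = (f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b
\end{cases}$$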

Then, by factorization/contraction, 

$$E_1 = (f(x_1, x_1))^{*_{x_1}} \cdot_{x_1} f((f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b,\ a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a))$$

Finally, we obtain the solution 

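$$\begin{cases}
E_1 = (f(x_1, x_1))^{*_{x_1}} \cdot_{x_1} f((f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b,\ a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) \\
E_2 = (f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b \\
E_3 = (h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a \\
E_4 = a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)
\end{cases}$$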

Since the final states are 1 and 3, $L(A)$ is denoted by:

$$(f(x_1, x_1))^{*_{x_1}} \cdot_{x_1} f((f(x_2, a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)))^{*_{x_2}} \cdot_{x_2} b,\ a + h((h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a)) + (h(a + h(x_3)))^{*_{x_3}} \cdot_{x_3} a$$
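As a sanity check of Theorem 4 on this example, the Python sketch below compares, for every height $d \leq 3$, the trees of height at most $d$ denoted by the expression above with the trees of height at most $d$ accepted by $A$. The encoding and names are ours; the transitions of $A$ are read off from the system $\mathcal{X}$ of this example, the final states are 1 and 3, and we assume the usual semantics of $\cdot_c$ (independent replacement of each leaf $c$) and of $^{*_c}$ (union of the iterates $L^{0,c} = \{c\}$, $L^{n+1,c} = L^{n,c} \cup L \cdot_c L^{n,c}$). A bounded comparison is evidence, not a proof.

```python
from itertools import product

# Hypothetical encoding (ours): trees are (label, child, ...); expressions are
# ("sym", f, *args), ("+", A, B), (".", c, A, B) for A ._c B, ("*", c, A) for A^{*c}.

def height(t):
    return 0 if len(t) == 1 else 1 + max(height(ch) for ch in t[1:])

def subst(t, c, repl, d):
    """Replace each leaf c of t independently by a tree of repl; keep height <= d."""
    if len(t) == 1:
        return set(repl) if t[0] == c else {t}
    out = set()
    for kids in product(*(subst(ch, c, repl, d) for ch in t[1:])):
        if height((t[0],) + kids) <= d:
            out.add((t[0],) + kids)
    return out

def lang(e, d):
    """Trees of height <= d denoted by the expression e."""
    if d < 0:
        return set()
    if e[0] == "sym":
        return {(e[1],) + k for k in product(*(lang(arg, d - 1) for arg in e[2:]))}
    if e[0] == "+":
        return lang(e[1], d) | lang(e[2], d)
    if e[0] == ".":
        right = lang(e[3], d)
        return set().union(*(subst(t, e[1], right, d) for t in lang(e[2], d)))
    base, acc = lang(e[2], d), {(e[1],)}  # remaining case: ("*", c, A)
    while True:
        new = acc | set().union(*(subst(t, e[1], acc, d) for t in base))
        if new == acc:
            return acc
        acc = new

# The FTA A: transitions read off from the system X of Example 6.
DELTA = [("f", 1, 1, 1), ("f", 2, 4, 1), ("b", 2), ("f", 2, 4, 2),
         ("a", 3), ("h", 4, 3), ("a", 4), ("h", 3, 4)]
FINAL = {1, 3}
RANK = {"f": 2, "h": 1, "a": 0, "b": 0}

def delta(t):
    """States reachable at the root of t, computed bottom-up."""
    below = [delta(c) for c in t[1:]]
    return {tr[-1] for tr in DELTA
            if tr[0] == t[0] and len(tr) == len(t) + 1
            and all(q in s for q, s in zip(tr[1:-1], below))}

def all_trees(d):
    """Every tree over RANK of height at most d."""
    if d < 0:
        return set()
    smaller = all_trees(d - 1)
    return {(g,) + k for g, r in RANK.items() for k in product(smaller, repeat=r)}

def sym(f, *args):
    return ("sym", f, *args)

a, b, x1, x2, x3 = (sym(s) for s in ("a", "b", "x1", "x2", "x3"))
E3 = (".", "x3", ("*", "x3", sym("h", ("+", a, sym("h", x3)))), a)
E4 = ("+", a, sym("h", E3))
E2 = (".", "x2", ("*", "x2", sym("f", x2, E4)), b)
E1 = (".", "x1", ("*", "x1", sym("f", x1, x1)), sym("f", E2, E4))
expr = ("+", E1, E3)  # sum of the solutions of the final states 1 and 3

for d in range(4):
    accepted = {t for t in all_trees(d) if delta(t) & FINAL}
    print(d, len(accepted), lang(expr, d) == accepted)
```

The height bound keeps every language finite; raising it quickly becomes expensive, since the number of trees over this alphabet grows doubly exponentially with the height.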



6 Conclusion 

We present a new construction of a rational expression from a tree automaton. This construction, based on a generalization of Arden's Lemma, gives another way to prove Kleene's theorem for trees. In order to produce the expression, we studied tree language equation systems and determined a sufficient condition for solving them. The next step is to study the links that may exist between the different methods of computing an expression from an automaton, as it was studied in Ha-